AITopics | distributional data

Collaborating Authors

distributional data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Kernel K-means clustering of distributional data

Baíllo, Amparo, Berrendero, Jose R., Sánchez-Signorini, Martín

arXiv.org Machine LearningSep-23-2025

We consider the problem of clustering a sample of probability distributions from a random distribution on $\mathbb R^p$. Our proposed partitioning method makes use of a symmetric, positive-definite kernel $k$ and its associated reproducing kernel Hilbert space (RKHS) $\mathcal H$. By mapping each distribution to its corresponding kernel mean embedding in $\mathcal H$, we obtain a sample in this RKHS where we carry out the $K$-means clustering procedure, which provides an unsupervised classification of the original sample. The procedure is simple and computationally feasible even for dimension $p>1$. The simulation studies provide insight into the choice of the kernel and its tuning parameter. The performance of the proposed clustering procedure is illustrated on a collection of Synthetic Aperture Radar (SAR) images.

distributional data, kernel, procedure, (17 more...)

arXiv.org Machine Learning

2509.18037

Country: Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Interpretable Causal Inference for Analyzing Wearable, Sensor, and Distributional Data

Katta, Srikar, Parikh, Harsh, Rudin, Cynthia, Volfovsky, Alexander

arXiv.org Artificial IntelligenceDec-16-2023

Many modern causal questions ask how treatments affect complex outcomes that are measured using wearable devices and sensors. Current analysis approaches require summarizing these data into scalar statistics (e.g., the mean), but these summaries can be misleading. For example, disparate distributions can have the same means, variances, and other statistics. Researchers can overcome the loss of information by instead representing the data as distributions. We develop an interpretable method for distributional data analysis that ensures trustworthy and robust decision-making: Analyzing Distributional Data via Matching After Learning to Stretch (ADD MALTS). We (i) provide analytical guarantees of the correctness of our estimation strategy, (ii) demonstrate via simulation that ADD MALTS outperforms other distributional data analysis methods at estimating treatment effects, and (iii) illustrate ADD MALTS' ability to verify whether there is enough cohesion between treatment and control units within subpopulations to trustworthily estimate treatment effects. We demonstrate ADD MALTS' utility by studying the effectiveness of continuous glucose monitors in mitigating diabetes risks.

malt, quantile function, treatment effect, (14 more...)

arXiv.org Artificial Intelligence

2312.10569

Country: North America > United States (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)

Add feedback

Nonlinear Sufficient Dimension Reduction for Distribution-on-Distribution Regression

Zhang, Qi, Li, Bing, Xue, Lingzhou

arXiv.org Machine LearningApr-24-2023

We introduce a new approach to nonlinear sufficient dimension reduction in cases where both the predictor and the response are distributional data, modeled as members of a metric space. Our key step is to build universal kernels (cc-universal) on the metric spaces, which results in reproducing kernel Hilbert spaces for the predictor and response that are rich enough to characterize the conditional independence that determines sufficient dimension reduction. For univariate distributions, we construct the universal kernel using the Wasserstein distance, while for multivariate distributions, we resort to the sliced Wasserstein distance. The sliced Wasserstein distance ensures that the metric space possesses similar topological properties to the Wasserstein space while also offering significant computation benefits. Numerical results based on synthetic data show that our method outperforms possible competing methods. The method is also applied to several data sets, including fertility and mortality data and Calgary temperature data.

artificial intelligence, machine learning, predictor, (17 more...)

arXiv.org Machine Learning

2207.04613

Country:

North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.24)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Closed pattern mining of interval data and distributional data

Soldano, Henry, Santini, Guillaume, Zevio, Stella

arXiv.org Artificial IntelligenceDec-9-2022

We discuss pattern languages for closed pattern mining and learning of interval data and distributional data. We first introduce pattern languages relying on pairs of intersection-based constraints or pairs of inclusion based constraints, or both, applied to intervals. We discuss the encoding of such interval patterns as itemsets thus allowing to use closed itemsets mining and formal concept analysis programs. We experiment these languages on clustering and supervised learning tasks. Then we show how to extend the approach to address distributional data.

data mining, machine learning, pattern recognition, (17 more...)

arXiv.org Artificial Intelligence

2212.04849

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.73)

Add feedback

Variational Wasserstein Barycenters for Geometric Clustering

Mi, Liang, Yu, Tianshu, Bento, Jose, Zhang, Wen, Li, Baoxin, Wang, Yalin

arXiv.org Machine LearningFeb-24-2020

We propose to compute Wasserstein barycenters (WBs) by solving for Monge maps with variational principle. We discuss the metric properties of WBs and explore their connections, especially the connections of Monge WBs, to K-means clustering and co-clustering. We also discuss the feasibility of Monge WBs on unbalanced measures and spherical domains. We propose two new problems -- regularized K-means and Wasserstein barycenter compression. We demonstrate the use of VWBs in solving these clustering-related problems.

barycenter, optimal transport, variational wasserstein barycenter, (13 more...)

arXiv.org Machine Learning

2002.10543

Country: North America > United States > Arizona (0.04)

Genre:

Overview (0.66)
Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Retrofitting Distributional Embeddings to Knowledge Graphs with Functional Relations

Lengerich, Benjamin J., Maas, Andrew L., Potts, Christopher

arXiv.org Machine LearningOct-25-2017

Knowledge graphs are a versatile framework to encode richly structured data relationships, but it not always apparent how to combine these with existing entity representations. Methods for retrofitting pre-trained entity representations to the structure of a knowledge graph typically assume that entities are embedded in a connected space and that relations imply similarity. However, useful knowledge graphs often contain diverse entities and relations (with potentially disjoint underlying corpora) which do not accord with these assumptions. To overcome these limitations, we present Functional Retrofitting, a framework that generalizes current retrofitting methods by explicitly modeling pairwise relations. Our framework can directly incorporate a variety of pairwise penalty functions previously developed for knowledge graph completion. We present both linear and neural instantiations of the framework. Functional Retrofitting significantly outperforms existing retrofitting methods on complex knowledge graphs and loses no accuracy on simpler graphs (in which relations do imply similarity). Finally, we demonstrate the utility of the framework by predicting new drug--disease treatment pairs in a large, complex health knowledge graph.

artificial intelligence, graph, knowledge graph, (13 more...)

arXiv.org Machine Learning

1708.00112

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Dermatology (0.47)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)

Add feedback

Fuzzy clustering of distribution-valued data using adaptive L2 Wasserstein distances

Irpino, Antonio, De Carvalho, Francisco, Verde, Rosanna

arXiv.org Machine LearningMay-2-2016

Distributional (or distribution-valued) data are a new type of data arising from several sources and are considered as realizations of distributional variables. A new set of fuzzy c-means algorithms for data described by distributional variables is proposed. The algorithms use the $L2$ Wasserstein distance between distributions as dissimilarity measures. Beside the extension of the fuzzy c-means algorithm for distributional data, and considering a decomposition of the squared $L2$ Wasserstein distance, we propose a set of algorithms using different automatic way to compute the weights associated with the variables as well as with their components, globally or cluster-wise. The relevance weights are computed in the clustering process introducing product-to-one constraints. The relevance weights induce adaptive distances expressing the importance of each variable or of each component in the clustering process, acting also as a variable selection method in clustering. We have tested the proposed algorithms on artificial and real-world data. Results confirm that the proposed methods are able to better take into account the cluster structure of the data with respect to the standard fuzzy c-means, with non-adaptive distances.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

1605.00513

Country:

Asia > Azerbaijan (0.04)
South America > Brazil > Pernambuco > Recife (0.04)
Oceania > Solomon Islands (0.04)
(43 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback